Connected Digit Recognition over Long Distance Telephone Lines using the SPHINX-II System

Author

  • Uday Jain
Abstract

1 Introduction
  1.1 Large vocabulary vs. small vocabulary tasks
  1.2 Overview
2 The SPHINX II System
  2.1 Overview of SPHINX II
  2.2 Hidden Markov Models (HMMs)
    Recognition Unit
  2.3 The SPHINX-II Trainer
    Acoustical Feature Extraction
    Lexical Feature Extraction
    Context-Independent System Training
    Segmentation and Senonic Clustering
    Context-Dependent System Training
  2.4 Decoder
  2.5 The Training Procedure
    Bootstrapped Training
      Acoustic Feature Extraction
      Lexical Feature Extraction
      Segmentation
      Creation of CI-DHMMs
      Creation of the Senonic Decision Trees and Mapping Table
      Creation of CD-SCHMMs
      Fine Tuning the CD-SCHMMs
    Training from Scratch
      Acoustic Feature Extraction
      Lexical Feature Extraction
      Creation of CI-SCHMMs
      Segmentation and the Creation of Senonic Decision Trees and Mapping Table
      Creation of CD-SCHMMs
      Fine Tuning the CD-SCHMMs
  2.6 Summary
3 Speech Corpora
  3.1 MALL Databases
  3.2 Other Databases
    Reduced Bandwidth AN4
    Filtered WSJ0+WSJ1
    Macrophone
  3.3 Summary
4 Performance of Existing Systems
  4.1 Reduced Bandwidth AN4
  4.2 Filtered WSJ 1PD models
  4.3 Macrophone models
  4.4 Summary
5 Bootstrapped Training
  5.1 Filtered WSJ models
  5.2 Macrophone models
  5.3 Summary
6 Data-Driven Training
  6.1 MALL88
    Context-Independent Semi-Continuous HMMs
    Context-Dependent Semi-Continuous HMMs
  6.2 MALL91
    Context-Independent Semi-Continuous HMMs
    Context-Dependent Semi-Continuous HMMs
    Context-Dependent Semi-Continuous HMMs
  6.3 Summary
7 Word-Based Systems
  7.1 System Description
  7.2 Training
    The MALL88 Database
      Context-Independent Semi-Continuous HMMs
      Context-Dependent Semi-Continuous HMMs
      Further Processing
    The MALL91 Database
      Context-Independent Semi-Continuous HMMs
      Context-Dependent Semi-Continuous HMMs
      Further Processing
  7.3 Summary
8 Towards a Word-based System using an Approximation to Continuous HMMs
  8.1 Training using Multiple Gaussian Set (MGS)
  8.2 Gender-dependent training
  8.3 Summary
9 Environmental Adaptation
  9.1 CDCN
  9.2 Cross Environment Normalization
    MALL88
    MALL91
  9.3 Test set Normalization
  9.4 Summary
10 Conclusion
  10.1 Training models closer to the training data
  10.2 Making the system completely digit oriented
  10.3 Increased models size and reduced parameter sharing
  10.4 Normalization
  10.5 Future Work
    Power variance training
    Silence removal
References

Similar resources

Sources of degradation of speech recognition in the telephone network

In this paper we compare speech recognition accuracy for high-quality speech recorded under controlled conditions with speech as it appears over long-distance telephone lines. In addition to comparing recognition accuracy, we use telephone-channel simulation to identify the sources of degradation of speech over telephone lines that have the greatest impact on speech recognition accuracy. We firs...

Continuous Recognition of Large-vocabulary Telephone-quality Speech

The problem of speech recognition over telephone lines is growing in importance, as many near-term applications of spoken-language processing are likely to involve telephone speech. This paper describes recent efforts by the CMU speech group to improve the recognition accuracy of telephone-channel speech, particularly in the context of the 1994 ARPA common Hub 2 evaluation of speech over long-d...

基於Sphinx 可快速個人化行動數字語音辨識系統 (Quickly Personalizable Mobile Digit Speech Recognition System Based on Sphinx) [In Chinese]

In this paper, we introduce a system for on-line digit speech recognition services. Besides the speech recognition service, our system also provides an adaptation function to improve noise robustness across different environments. In the case of English digit recognition, our recognition system can achieve over 80% accuracy for a specific speaker using only a small amount of adaptation data. We use Sph...

Connected Digits Recognition Task: ISTC–CNR Comparison of Open Source Tools

EVALITA is a recent initiative devoted to the evaluation of Natural Language and Speech Processing tools for Italian. This work describes the results of three open source ASR toolkits. CSLU Speech Tools, CSLR SONIC, and CMU SPHINX are applied to the EVALITA clean and noisy digits recognition task, and this report describes the complete evaluation methodology. CSLR SONIC has resulted ...

Digit Recognition Using the SPEECHDAT Corpus

With the remarkable evolution of telecommunications as we reach the end of this century, it becomes clear that speech recognition via the telephone network will play an increasingly important role, mainly due to the widespread use of both cellular and non-cellular telephones. For many applications of speech recognition over the telephone, digit recognition is fundamental. This paper describes a...

Publication date: 1996